26 research outputs found
Level Playing Field for Million Scale Face Recognition
Face recognition is often perceived as a solved problem; however, when tested
at the million scale, it exhibits dramatic variation in accuracy across
different algorithms. Are the algorithms very different? Is access to good/big
training data their secret weapon? Where should face recognition improve? To
address those questions, we created a benchmark, MF2, that requires all
algorithms to be trained on the same data and tested at the million scale. MF2 is
a public large-scale set with 672K identities and 4.7M photos, created with the
goal of leveling the playing field for large-scale face recognition. We contrast our
results with findings from the other two large-scale benchmarks, MegaFace
Challenge and MS-Celebs-1M, where groups were allowed to train on any
private/public/big/small set. Some key discoveries: 1) algorithms trained on
MF2 were able to achieve state-of-the-art results comparable to those of algorithms
trained on massive private sets, 2) some outperformed their own previous results
once trained on MF2, 3) invariance to aging suffers from low accuracies, as in
MegaFace, identifying the need for larger age variations, possibly within
identities, or adjustment of algorithms in future tests.
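As a rough illustration of what testing against a large distractor set involves, the sketch below scores rank-1 identification: a probe counts as a hit only if its true mate is more similar to it than every distractor. This is a generic sketch and not the MF2 evaluation code; the function names, the use of cosine similarity, and the toy 2-D embeddings are all assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank1_hit(probe: Vector, mate: Vector, distractors: List[Vector]) -> bool:
    """True if the mate scores strictly higher than every distractor."""
    mate_score = cosine(probe, mate)
    return all(cosine(probe, d) < mate_score for d in distractors)


def rank1_rate(trials: List[Tuple[Vector, Vector, List[Vector]]]) -> float:
    """Fraction of (probe, mate, distractors) trials that are rank-1 hits."""
    hits = sum(1 for probe, mate, ds in trials if rank1_hit(probe, mate, ds))
    return hits / len(trials)


# Toy data (hypothetical): the first trial is a hit, the second is a miss
# because a distractor lies closer to the probe than the mate does.
trials = [
    ([1.0, 0.0], [0.9, 0.1], [[0.0, 1.0], [0.5, 0.5]]),
    ([1.0, 0.0], [0.5, 0.5], [[0.9, 0.1]]),
]
rate = rank1_rate(trials)
```

In a real benchmark the distractor list would hold on the order of a million embeddings, so the comparison would normally be vectorized rather than looped in pure Python.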
Soccer on Your Tabletop
We present a system that transforms a monocular video of a soccer game into a
moving 3D reconstruction, in which the players and field can be rendered
interactively with a 3D viewer or through an Augmented Reality device. At the
heart of our paper is an approach to estimate the depth map of each player,
using a CNN that is trained on 3D player data extracted from soccer video
games. We compare with state-of-the-art body pose and depth estimation
techniques, and show results on both synthetic ground-truth benchmarks and
real YouTube soccer footage.
Comment: CVPR'18. Project: http://grail.cs.washington.edu/projects/soccer
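To make the link between a per-player depth map and a renderable 3D reconstruction concrete, the sketch below back-projects depth pixels into camera-space 3D points using a standard pinhole camera model. It is not the paper's pipeline; the function name, the intrinsics parameters (fx, fy, cx, cy), and the convention that non-positive depth marks background pixels are assumptions made for illustration.

```python
from typing import List, Tuple

Point3D = Tuple[float, float, float]


def backproject(
    depth: List[List[float]], fx: float, fy: float, cx: float, cy: float
) -> List[Point3D]:
    """Convert a depth map into 3D points under a pinhole camera model.

    depth[v][u] is the depth at pixel column u, row v. Pixels with
    non-positive depth are treated as background and skipped (an
    assumption of this sketch).
    """
    points: List[Point3D] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue
            x = (u - cx) * z / fx  # horizontal offset scaled by depth
            y = (v - cy) * z / fy  # vertical offset scaled by depth
            points.append((x, y, z))
    return points


# Toy 2x2 depth map (hypothetical values); zeros mark background pixels.
toy_depth = [
    [0.0, 2.0],
    [4.0, 0.0],
]
cloud = backproject(toy_depth, fx=2.0, fy=2.0, cx=0.0, cy=0.0)
```

The resulting point list could then be meshed or placed on a field plane for viewing; that step depends on the renderer and is left out here.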
DreamPose: Fashion Image-to-Video Synthesis via Stable Diffusion
We present DreamPose, a diffusion-based method for generating animated
fashion videos from still images. Given an image and a sequence of human body
poses, our method synthesizes a video containing both human and fabric motion.
To achieve this, we transform a pretrained text-to-image model (Stable
Diffusion) into a pose-and-image guided video synthesis model, using a novel
fine-tuning strategy, a set of architectural changes to support the added
conditioning signals, and techniques to encourage temporal consistency. We
fine-tune on a collection of fashion videos from the UBC Fashion dataset. We
evaluate our method on a variety of clothing styles and poses, and demonstrate
that our method produces state-of-the-art results on fashion video animation.
Video results are available on our project page.
Comment: Project page: https://grail.cs.washington.edu/projects/dreampose